125 research outputs found

    On Counting Triangles through Edge Sampling in Large Dynamic Graphs

    Traditional frameworks for dynamic graphs have relied on processing only the stream of edges added to or deleted from an evolving graph, and not on any additional related information such as the degrees or neighbor lists of the nodes incident to those edges. In this paper, we propose a new edge sampling framework for big-graph analytics in dynamic graphs which enhances the traditional model by enabling the use of such additional information. To demonstrate the advantages of this framework, we present a new sampling algorithm, called Edge Sample and Discard (ESD). It generates an unbiased estimate of the total number of triangles, which can be continuously updated in response to both edge additions and deletions. We provide a comparative analysis of the performance of ESD against two current state-of-the-art algorithms in terms of accuracy and complexity. The results of experiments performed on real graphs show that, with the help of the neighborhood information of the sampled edges, the accuracy achieved by our algorithm is substantially better. We also characterize the impact of graph properties on the performance of our algorithm by testing on several Barabasi-Albert graphs. Comment: A short version of this article appeared in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2017).
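    As a rough illustration of why neighborhood information helps, the sketch below estimates the triangle count of a static graph by sampling edges and counting the common neighbors of each sampled edge. It is not the ESD algorithm from the abstract (which also handles edge deletions); the function names, the independent sampling probability `p`, and the assumption of a simple graph given as a duplicate-free edge list without self-loops are all illustrative choices.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected neighbor sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def estimate_triangles(edges, p, rng=None):
    """Estimate the number of triangles by sampling each edge with probability p.

    Every triangle contains three edges, so summing the common-neighbor
    count |N(u) & N(v)| over ALL edges gives 3 * T. Sampling each edge
    independently with probability p and dividing the sampled sum by
    3 * p therefore gives an unbiased estimate of T.
    Assumes a simple graph: no self-loops, no duplicate edges.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    rng = rng or random.Random()
    adj = build_adjacency(edges)
    sampled_sum = 0
    for u, v in edges:
        if rng.random() < p:
            # Neighbor lists of the sampled edge's endpoints are the
            # "additional information" used here.
            sampled_sum += len(adj[u] & adj[v])
    return sampled_sum / (3.0 * p)
```

    With `p = 1.0` every edge is kept and the function returns the exact count; smaller values of `p` trade accuracy for less work per edge.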

    Crude incidence in two-phase designs in the presence of competing risks.

    Background: In many studies, some information might not be available for the whole cohort: some covariates, or even the outcome, might be ascertained only in selected subsamples. These studies are part of a broad category termed two-phase studies. Common examples include the nested case-control and the case-cohort designs. For two-phase studies, appropriate weighted survival estimates have been derived; however, no estimator of cumulative incidence accounting for competing events has been proposed. This is relevant in the presence of multiple types of events, where estimation of event-type-specific quantities is needed for evaluating outcome. Methods: We develop a nonparametric estimator of the cumulative incidence function of events accounting for possible competing events. It handles a general sampling design by weights derived from the sampling probabilities. The variance is derived from the influence function of the subdistribution hazard. Results: The proposed method shows good performance in simulations. It is applied to estimate the crude incidence of relapse in childhood acute lymphoblastic leukemia in groups defined by a genotype not available for everyone in a cohort of nearly 2000 patients, where death due to toxicity acted as a competing event. In a second example, the aim was to estimate engagement in care of a cohort of HIV patients in a resource-limited setting, where for some patients the outcome itself was missing due to loss to follow-up. A sampling-based approach was used to identify the outcome in a subsample of lost patients and to obtain a valid estimate of connection to care. Conclusions: A valid estimator for cumulative incidence of events accounting for competing risks under a general sampling design from an infinite target population is derived.

    Kernel-based methods for combining information of several frame surveys

    A sample selected from a single sampling frame may not adequately represent the entire population. Multiple frame surveys are increasingly used by statistical agencies and private organizations, in particular in situations where several sampling frames may provide better coverage or can reduce sampling costs for estimating population quantities of interest. Auxiliary information available at the population level is often categorical in nature, so that incorporating both categorical and continuous information can improve the efficiency of the estimation method. Nonparametric regression methods represent a widely used and flexible estimation approach in the survey context. We propose a kernel regression estimator for dual frame surveys that can handle both continuous and categorical data. This methodology is extended to multiple frame surveys. We derive theoretical properties of the proposed methods, and numerical experiments indicate that the proposed estimator performs well in practical settings under different scenarios. Ministerio de Economía y Competitividad; Consejería de Economía, Innovación, Ciencia y Emple

    Visualization of C. elegans transgenic arrays by GFP

    BACKGROUND: Targeting the green fluorescent protein (GFP) via the E. coli lac repressor (LacI) to a specific DNA sequence, the lac operator (lacO), allows visualization of chromosomes in yeast and mammalian cells. In principle, this method of visualization could be used for genetic mosaic analysis, which requires cell-autonomous markers that can be scored easily and at single-cell resolution. The C. elegans lin-3 gene encodes a growth factor of the epidermal growth factor (EGF) family. lin-3 is expressed in the gonadal anchor cell and acts through LET-23 (a transmembrane protein tyrosine kinase and ortholog of the EGF receptor) to signal the vulval precursor cells to generate vulval tissue. lin-3 is expressed in the vulval cells later, and recent evidence raises the possibility that lin-3 acts in the vulval cells as a relay signal during vulval induction. It is thus of interest to test the site of action of lin-3 by mosaic analysis. RESULTS: We visualized transgenes in living C. elegans by targeting the green fluorescent protein (GFP) via the E. coli lac repressor (LacI) to a specific 256 sequence repeat of the lac operator (lacO) incorporated into transgenes. We engineered animals to express a nuclear-localized GFP-LacI fusion protein. C. elegans cells carrying a lacO transgene show nuclear-localized bright spots (i.e., GFP-LacI bound to lacO). Cells with diffuse nuclear fluorescence correspond to unbound nuclear-localized GFP-LacI. We detected chromosomes in living animals by chromosomally integrating the array of the lacO repeat sequence and visualizing the integrated transgene with GFP-LacI. This detection system can be applied to determine polyploidy as well as to investigate chromosome segregation. To assess the GFP-LacI•lacO system as a marker for mosaic analysis, we conducted genetic mosaic analysis of the epidermal growth factor lin-3, expressed in the anchor cell. 
We establish that lin-3 acts in the anchor cell to induce vulva development, demonstrating this method's utility in detecting the presence of a transgene. CONCLUSION: The GFP-LacI•lacO transgene detection system works in C. elegans for visualization of chromosomes and extrachromosomal transgenes. It can be used as a marker for genetic mosaic analysis. Using the lacO repeat sequence as an extrachromosomal array is a valuable technique that allows rapid, accurate determination of spontaneous loss of the array, thereby allowing high-resolution mosaic analysis. The lin-3 gene is required in the anchor cell to induce the epidermal vulval precursor cells to undergo vulval development.

    Transcriptional control in the prereplicative phase of T4 development

    Control of transcription is crucial for correct gene expression and orderly development. For many years, bacteriophage T4 has provided a simple model system to investigate mechanisms that regulate this process. Development of T4 requires the transcription of early, middle and late RNAs. Because T4 does not encode its own RNA polymerase, it must redirect the polymerase of its host, E. coli, to the correct class of genes at the correct time. T4 accomplishes this through the action of phage-encoded factors. Here I review recent studies investigating the transcription of T4 prereplicative genes, which are expressed as early and middle transcripts. Early RNAs are generated immediately after infection from T4 promoters that contain excellent recognition sequences for host polymerase. Consequently, the early promoters compete extremely well with host promoters for the available polymerase. T4 early promoter activity is further enhanced by the action of the T4 Alt protein, a component of the phage head that is injected into E. coli along with the phage DNA. Alt modifies Arg265 on one of the two α subunits of RNA polymerase. Although work with host promoters predicts that this modification should decrease promoter activity, transcription from some T4 early promoters increases when RNA polymerase is modified by Alt. Transcription of T4 middle genes begins about 1 minute after infection and proceeds by two pathways: 1) extension of early transcripts into downstream middle genes and 2) activation of T4 middle promoters through a process called sigma appropriation. In this activation, the T4 co-activator AsiA binds to Region 4 of σ70, the specificity subunit of RNA polymerase. This binding dramatically remodels this portion of σ70, which then allows the T4 activator MotA to also interact with σ70. 
In addition, AsiA restructuring of σ70 prevents Region 4 from forming its normal contacts with the -35 region of promoter DNA, which in turn allows MotA to interact with its DNA binding site, a MotA box, centered at the -30 region of middle promoter DNA. T4 sigma appropriation reveals how a specific domain within RNA polymerase can be remolded and then exploited to alter promoter specificity.

    Advances in estimation by the item sum technique using auxiliary information in complex surveys

    To collect sensitive data, survey statisticians have designed many strategies to reduce nonresponse rates and social desirability response bias. In recent years, the item count technique (ICT) has gained considerable popularity and credibility as an alternative mode of indirect questioning survey, and several variants of this technique have been proposed as new needs and challenges arise. The item sum technique (IST), which was introduced by Chaudhuri and Christofides (2013) and Trappmann et al. (2014), is one such variant, used to estimate the mean of a sensitive quantitative variable. In this approach, sampled units are asked to respond to a two-list of items containing a sensitive question related to the study variable and various innocuous, nonsensitive, questions. To the best of our knowledge, very few theoretical and applied papers have addressed the IST. In this article, therefore, we present certain methodological advances as a contribution to appraising the use of the IST in real-world surveys. In particular, we employ a generic sampling design to examine the problem of how to improve the estimates of the sensitive mean when auxiliary information on the population under study is available and is used at the design and estimation stages. A Horvitz-Thompson type estimator and a calibration type estimator are proposed and their efficiency is evaluated by means of an extensive simulation study. Using simulation experiments, we show that estimates obtained by the IST are nearly equivalent to those obtained using “true data” and that in general they outperform the estimates provided by a competitive randomized response method. Moreover, the variance estimation may be considered satisfactory. 
These results open up new perspectives for academics, researchers and survey practitioners, and could justify the use of the IST as a valid alternative to traditional direct questioning survey modes. Ministerio de Economía y Competitividad of Spain; Ministerio de Educacion, Cultura y Deporte; project PRIN-SURWE
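    For readers unfamiliar with the Horvitz-Thompson type estimator mentioned in this abstract, the sketch below shows one common way it can be applied to item sum data: weight each answer by the inverse of the unit's inclusion probability, then subtract the short-list mean (innocuous items only) from the long-list mean (innocuous items plus the sensitive item). The function names, the two-subsample layout, and the assumption that the population size and each unit's inclusion probability in its own subsample are known are illustrative, and this is not necessarily the exact estimator proposed in the article.

```python
def ht_mean(values, inclusion_probs, population_size):
    """Horvitz-Thompson estimate of a population mean.

    Each sampled value is weighted by 1 / pi_i, where pi_i is the
    probability that unit i was included in this (sub)sample.
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if any(not 0.0 < pi <= 1.0 for pi in inclusion_probs):
        raise ValueError("inclusion probabilities must be in (0, 1]")
    weighted_total = sum(y / pi for y, pi in zip(values, inclusion_probs))
    return weighted_total / population_size


def ist_sensitive_mean(long_answers, long_probs,
                       short_answers, short_probs, population_size):
    """Estimate the mean of the sensitive variable from item sum answers.

    long_answers:  reported sums of the sensitive item plus innocuous items.
    short_answers: reported sums of the innocuous items only.
    The difference of the two weighted means isolates the sensitive mean.
    """
    long_mean = ht_mean(long_answers, long_probs, population_size)
    short_mean = ht_mean(short_answers, short_probs, population_size)
    return long_mean - short_mean
```

    A calibration type estimator would adjust the `1 / pi_i` weights using auxiliary totals before taking the same difference; that step is omitted here.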

    The epidemiology of enterococci

    The enterococci are emerging as a significant cause of nosocomial infections, accounting for approximately 10% of hospital-acquired infections. They are found as normal inhabitants of the human gastrointestinal tract, but may also colonize the oropharynx, vagina, perineal region and soft tissue wounds of asymptomatic patients. Until recently, evidence indicated that most enterococcal infections arose from patients' own endogenous flora. Recent studies, however, suggest that exogenous acquisition may occur and that person-to-person spread, probably on the hands of medical personnel, may be a significant mode of transmission of resistant enterococci within the hospital. The use of broad-spectrum antibiotics, especially cephalosporins, is another major factor in the increasing incidence of enterococcal infections. These findings suggest that barrier precautions, as applied with other resistant nosocomial pathogens, along with more judicious use of antibiotics, may be beneficial in preventing nosocomial spread of resistant enterococci. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/47899/1/10096_2005_Article_BF01963631.pd

    On the Reliability of Network Measurement Techniques Used for Malware Traffic Analysis
